LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval

Authors

  • Alexandre L. Gonçalves
  • Jianhan Zhu
  • Dawei Song
  • Victoria S. Uren
  • Roberto Carlos dos Santos Pacheco
Abstract

In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) and document clustering. LRD extracts terms and entities, such as person, organization, or project names, and discovers relationships between them based on their co-occurrence in textual corpora. Given a target entity, LRD effectively and efficiently discovers other entities closely related to it. To quantify this relatedness, we define a measure of relation strength between entities. LRD uses relation strength to enhance the vector space model, and applies the enhanced model to query-based document retrieval and to document clustering in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query-based IR show that LRD performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
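The abstract does not give the exact definition of relation strength, so the sketch below only illustrates the general idea under stated assumptions. Relation strength rs(a, b) is taken to be the fraction of documents containing a that also contain b, a hypothetical asymmetric co-occurrence measure that is not necessarily the LRD formula. Expansion adds each related term u to a vector with weight alpha * w(t) * rs(t, u), where alpha = 0.5 is an assumed damping parameter, and documents are ranked by cosine similarity. The names `relation_strength`, `expand` and `cosine`, and the toy corpus, are illustrative.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def relation_strength(docs):
    """Hypothetical relation strength: rs[a][b] = docs containing a and b / docs containing a.

    Assumed stand-in for LRD's measure; asymmetric and in [0, 1].
    """
    df = Counter()   # document frequency of each term/entity
    co = Counter()   # document-level co-occurrence counts for ordered pairs
    for doc in docs:
        terms = set(doc)
        df.update(terms)
        for a, b in combinations(sorted(terms), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    rs = defaultdict(dict)
    for (a, b), n in co.items():
        rs[a][b] = n / df[a]
    return rs


def expand(vec, rs, alpha=0.5):
    """Return a copy of a term-weight vector with related terms added.

    Each original term t with weight w contributes alpha * w * rs[t][u] to term u.
    Only the original terms are expanded (no cascading through added terms).
    """
    out = dict(vec)
    for t, w in vec.items():
        for u, s in rs.get(t, {}).items():
            out[u] = out.get(u, 0.0) + alpha * w * s
    return out


def cosine(x, y):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * y.get(t, 0.0) for t, w in x.items())
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


# Toy corpus: each document is a list of extracted terms/entities (illustrative data).
docs = [
    ["lrd", "vector", "retrieval"],
    ["vector", "retrieval", "clustering"],
    ["ontology", "entity"],
]
rs = relation_strength(docs)

query = {"lrd": 1.0}
expanded = expand(query, rs)
doc2 = {"vector": 1.0, "retrieval": 1.0, "clustering": 1.0}

print("plain   :", cosine(query, doc2))
print("expanded:", round(cosine(expanded, doc2), 4))
```

In this toy corpus the unexpanded query {"lrd"} shares no term with the second document, so its cosine score is 0; after expansion it picks up "vector" and "retrieval" through their co-occurrence with "lrd", and the score becomes positive. The same `expand` function could be applied to document vectors instead of, or in addition to, the query.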


Similar resources

SMART Electronic Legal Discovery Via Topic Modeling

Electronic discovery is an interesting subproblem of information retrieval in which one identifies documents that are potentially relevant to issues and facts of a legal case from an electronically stored document collection (a corpus). In this paper, we consider representing documents in a topic space using the well-known topic models such as latent Dirichlet allocation and latent semantic in...


Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...


Evaluation of Vector Space Models for Medical Disorders Information Retrieval

Nowadays, consumers often search online to seek medical and health care information that they need. To improve this access, the ShARe/CLEF eHealth Evaluation Lab (SHEL) organized a shared task on information retrieval for Medical Disorders in 2013. This paper describes our participation in this task. In order to detect latent semantic relevance between queries and webpages about disorders, a se...


On-Demand Index for Efficient Structural Joins

An efficient indexing scheme for moving objects' trajectories on road networks p. 13 Spatial index compression for location-based services based on a MBR semi-approximation scheme p. 26 KCAM : concentrating on structural similarity for XML fragments p. 36 A new structure for accelerating XPath location steps p. 49 Efficient evaluation of multiple queries on streamed XML fragments p. 61 Automate...



Publication date: 2006